Transformers have proved to be very effective for visual recognition tasks. In particular, vision transformers construct compressed global representations through self-attention and learnable class tokens. Multi-resolution transformers have recently shown success in semantic segmentation, but they can only capture local interactions in high-resolution feature maps. This paper extends the notion of global tokens to build GLobal Attention Multi-resolution (GLAM) transformers. GLAM is a generic module that can be integrated into most existing transformer backbones. GLAM includes learnable global tokens which, unlike previous methods, can model interactions between all image regions, and it extracts powerful representations during training. Extensive experiments show that GLAM-Swin and GLAM-Swin-UNet perform substantially better than their vanilla counterparts on ADE20K and Cityscapes. Moreover, GLAM can be used to segment large 3D medical images, and GLAM-nnFormer achieves new state-of-the-art performance on the BCV dataset.
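The sketch below illustrates why global tokens matter. It assumes a simplified scheme that is not GLAM's actual design:

- Each window attends to its own tokens plus a shared set of global tokens.
- The global tokens in turn attend to every image token.

Learned projections, multiple heads and the multi-resolution stages are omitted. The names `attend` and `glam_like_step` are hypothetical. What the sketch shows is the information flow: after one step a window sees only itself and the global tokens, but after two steps it is influenced by every other window.

```python
import math
from typing import List, Tuple

Tokens = List[List[float]]  # a list of d-dimensional token vectors


def attend(queries: Tokens, keys_values: Tokens) -> Tokens:
    """Single-head scaled dot-product attention with identity Q/K/V projections (a simplification)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(x * y for x, y in zip(q, k)) / math.sqrt(d) for k in keys_values]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * kv[i] for w, kv in zip(weights, keys_values)) / z for i in range(d)])
    return out


def glam_like_step(windows: List[Tokens], global_tokens: Tokens) -> Tuple[List[Tokens], Tokens]:
    """One hypothetical step: windows attend locally plus to global tokens; global tokens attend to everything."""
    # Local window attention, with the global tokens prepended to each window's key/value set.
    new_windows = [attend(w, global_tokens + w) for w in windows]
    # Global tokens summarize all image regions (every token of every window).
    all_tokens = [t for w in windows for t in w]
    new_global = attend(global_tokens, global_tokens + all_tokens)
    return new_windows, new_global
```

If window 1 is perturbed, window 0's output is identical after one step but changes after two. This is the cross-window interaction that purely windowed attention cannot provide within a single stage.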
In this paper, we investigate the problem of multi-domain translation: given an element $a$ of domain $A$, we would like to generate a corresponding sample $b$ in another domain $B$, and vice versa. Acquiring supervision in multiple domains can be tedious, so we propose to learn this translation from one domain to another when supervision is available as a pair $(a,b)\sim A\times B$, while leveraging possible unpaired data when only $a\sim A$ or only $b\sim B$ is available. We introduce a new unified framework called Latent Space Mapping (\model) that exploits the manifold assumption in order to learn a latent space from each domain. Unlike existing approaches, we propose to further regularize each latent space using the available domains by learning each dependency between pairs of domains. We evaluate our approach on three tasks: i) image translation on a synthetic dataset, ii) semantic segmentation of medical images, a real-world task, and iii) facial landmark detection, another real-world task.
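The sketch below is a deliberately simplified illustration of mapping between latent spaces, not the framework's actual model. It makes three assumptions:

- The per-domain latent codes have already been learned, for example by per-domain autoencoders that can also use unpaired data.
- The dependency from $A$ to $B$ is a per-dimension affine map.
- That map is fitted by ordinary least squares on the paired subset only.

The helper names are hypothetical.

```python
from typing import List, Tuple

Latent = List[float]


def fit_affine(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept for one latent dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return 0.0, my  # constant input: predict the mean
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var
    return slope, my - slope * mx


def fit_latent_map(z_a: List[Latent], z_b: List[Latent]) -> List[Tuple[float, float]]:
    """Fit an A-latent -> B-latent map from paired codes (z_a[i] pairs with z_b[i])."""
    dims = len(z_a[0])
    return [fit_affine([z[i] for z in z_a], [z[i] for z in z_b]) for i in range(dims)]


def apply_map(params: List[Tuple[float, float]], z: Latent) -> Latent:
    """Translate a latent code of domain A into domain B's latent space."""
    return [slope * x + intercept for (slope, intercept), x in zip(params, z)]
```

Once fitted, the map can translate the latent code of any unpaired sample $a$. A decoder for $B$ (not shown) would then produce the output sample.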
Commonly adopted in the manufacturing and aerospace sectors, digital twin (DT) platforms are increasingly seen as a promising paradigm to control, monitor, and analyze software-based, "open", communication systems. Notably, DT platforms provide a sandbox in which to test artificial intelligence (AI) solutions for communication systems, potentially reducing the need to collect data and test algorithms in the field, i.e., on the physical twin (PT). A key challenge in the deployment of DT systems is to ensure that virtual control optimization, monitoring, and analysis at the DT are safe and reliable, avoiding incorrect decisions caused by "model exploitation". To address this challenge, this paper presents a general Bayesian framework with the aim of quantifying and accounting for model uncertainty at the DT that is caused by limitations in the amount and quality of data available at the DT from the PT. In the proposed framework, the DT builds a Bayesian model of the communication system, which is leveraged to enable core DT functionalities such as control via multi-agent reinforcement learning (MARL), monitoring of the PT for anomaly detection, prediction, data-collection optimization, and counterfactual analysis. To exemplify the application of the proposed framework, we specifically investigate a case-study system encompassing multiple sensing devices that report to a common receiver. Experimental results validate the effectiveness of the proposed Bayesian framework as compared to standard frequentist model-based solutions.
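The following is a minimal illustration of the difference between a Bayesian and a frequentist model when the DT has little data from the PT. The scenario is hypothetical: the unknown quantity is one sensing device's probability of successfully delivering a report to the common receiver. It is modeled with a Beta-Bernoulli posterior. This is a textbook conjugate model chosen for brevity, not the paper's model. The class and function names are also hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta posterior over a (hypothetical) per-device delivery probability."""
    alpha: float = 1.0  # prior pseudo-count of successes (uniform prior)
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, successes: int, failures: int) -> "BetaBelief":
        """Conjugate update with delivery outcomes observed at the PT."""
        return BetaBelief(self.alpha + successes, self.beta + failures)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def std(self) -> float:
        """Posterior standard deviation: the model (epistemic) uncertainty."""
        a, b = self.alpha, self.beta
        return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

    def pessimistic(self, k: float = 2.0) -> float:
        """Mean minus k standard deviations: a cautious value for decisions at the DT."""
        return max(0.0, self.mean() - k * self.std())


def frequentist_estimate(successes: int, failures: int) -> float:
    """Maximum-likelihood point estimate, which carries no uncertainty."""
    return successes / (successes + failures)
```

Consider three successful reports and no failures. The maximum-likelihood estimate is exactly 1.0, a value a controller could exploit as if the channel never failed. The posterior instead has mean 0.8 and a wide spread, and its cautious estimate only approaches 1 as more data arrives from the PT.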
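Large transformer models achieve state-of-the-art results on natural language understanding tasks and are increasingly becoming the baseline model architecture for modeling source code. Transformers are usually pretrained on large unsupervised corpora, learning token representations and transformations relevant to commonly available text, and are then fine-tuned on a specific downstream task of interest. While fine-tuning is a tried-and-true method for adapting a model to a new domain (e.g., question answering on a given topic), generalization remains an ongoing challenge. In this paper, we explore and evaluate transformer models for personalization. In the context of generating unit tests for Java methods, we evaluate learning to personalize to a specific software project using several personalization techniques. We consider three key approaches: (i) custom fine-tuning, which allows all the model parameters to be tuned; (ii) lightweight fine-tuning, which freezes most of the model's parameters and allows tuning of the token embeddings and softmax layer alone, or of the final layer alone; and (iii) prefix tuning, which keeps the model parameters frozen but optimizes a small project-specific prefix vector. Each of these techniques offers a trade-off between total compute cost and predictive performance, which we evaluate using code and task-specific metrics, training time, and total computational operations. We compare these fine-tuning strategies for code generation and discuss the potential generalization and cost benefits of each in various deployment scenarios.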
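The cost side of this trade-off can be made concrete by counting how many parameters each strategy trains. The sketch below is a rough count for a hypothetical decoder-only transformer. Its assumptions:

- Biases and layer norms are ignored.
- The output projection is untied from the embeddings.
- "Final layer" means the last transformer block.
- The prefix-tuning count uses the usual formulation of one key and one value prefix vector per layer and prefix position.

None of these sizes come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderShape:
    vocab: int     # token vocabulary size
    d_model: int   # hidden width
    n_layers: int  # number of transformer blocks


def block_params(d_model: int) -> int:
    # Attention (Q, K, V, output: 4 * d^2) + FFN with 4x expansion (8 * d^2);
    # biases and layer norms are ignored for simplicity.
    return 12 * d_model * d_model


def trainable_params(shape: DecoderShape, strategy: str, prefix_len: int = 0) -> int:
    """Approximate number of tuned parameters per project for each strategy."""
    embeddings = shape.vocab * shape.d_model
    softmax = shape.vocab * shape.d_model  # assumes an untied output projection
    blocks = shape.n_layers * block_params(shape.d_model)
    if strategy == "custom":         # (i) tune every parameter
        return embeddings + blocks + softmax
    if strategy == "embed_softmax":  # (ii) token embeddings and softmax layer only
        return embeddings + softmax
    if strategy == "final_layer":    # (ii) last transformer block only (assumed meaning)
        return block_params(shape.d_model)
    if strategy == "prefix":         # (iii) key and value prefix vectors for every layer
        return 2 * shape.n_layers * prefix_len * shape.d_model
    raise ValueError(f"unknown strategy: {strategy}")
```

Only the tuned parameters need to be stored per project, so this count also approximates the per-project storage cost in a multi-project deployment.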
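This paper presents a modified User Datagram Protocol (UDP) for federated learning that ensures efficiency and reliability during the transmission of model parameters, thereby maximizing the potential of the global model in each federated learning round. To develop and test this protocol, the NS3 simulator was used to simulate packet transmission over the network, while Google TensorFlow was used to create a custom federated learning environment. In this preliminary implementation, the simulation contains three nodes: two client nodes and one server node. The results obtained in this paper give confidence in the protocol's capabilities for future federated learning. Further simulations will cover the modified UDP protocol together with other protocols, and optimizations of the modified UDP that improve efficiency while ensuring reliability will also be explored.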
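The abstract does not specify what the modification to UDP is. The following is therefore only one common way to add reliability on top of datagrams, shown as a hypothetical scheme rather than the paper's protocol. It works as follows:

1. Model parameters are serialized as float32 values.
2. They are split into datagrams with a sequence-number header.
3. The receiver reports missing sequence numbers (a NACK list), and only those are retransmitted.

The packet loss is simulated by a callback. The header layout, the 1024-byte payload size, and all names are assumptions.

```python
import struct
from typing import Callable, Dict, List, Set, Tuple

HEADER = struct.Struct("!II")  # (sequence number, total chunk count), network byte order
MAX_PAYLOAD = 1024             # parameter bytes per datagram (assumed value)


def packetize(params: List[float]) -> List[bytes]:
    """Serialize float32 parameters and split them into numbered datagrams."""
    blob = struct.pack(f"!{len(params)}f", *params)
    chunks = [blob[i:i + MAX_PAYLOAD] for i in range(0, len(blob), MAX_PAYLOAD)] or [b""]
    return [HEADER.pack(seq, len(chunks)) + chunk for seq, chunk in enumerate(chunks)]


class Receiver:
    def __init__(self, total: int) -> None:
        self.total = total  # assumed to be announced before the transfer starts
        self.chunks: Dict[int, bytes] = {}

    def receive(self, datagram: bytes) -> None:
        seq, _ = HEADER.unpack_from(datagram)
        self.chunks[seq] = datagram[HEADER.size:]  # a duplicate simply overwrites

    def missing(self) -> Set[int]:
        """Sequence numbers to report back to the sender (the NACK list)."""
        return set(range(self.total)) - set(self.chunks)

    def assemble(self) -> List[float]:
        blob = b"".join(self.chunks[i] for i in range(self.total))
        return list(struct.unpack(f"!{len(blob) // 4}f", blob))


def transfer(params: List[float], drop: Callable[[int, int], bool],
             max_rounds: int = 10) -> Tuple[List[float], int]:
    """Send params over a lossy channel; drop(seq, round) decides whether a datagram is lost."""
    packets = packetize(params)
    rx = Receiver(len(packets))
    pending = set(range(len(packets)))
    rounds = 0
    while pending:
        if rounds == max_rounds:
            raise TimeoutError("transfer did not complete")
        for seq in sorted(pending):
            if not drop(seq, rounds):  # simulated channel loss
                rx.receive(packets[seq])
        pending = rx.missing()  # selective retransmission of missing chunks only
        rounds += 1
    return rx.assemble(), rounds
```

Retransmitting only the missing chunks keeps UDP's low overhead when the channel is clean, while still guaranteeing that the server receives a complete parameter vector.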
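Capturing the global topology of an image is essential for producing an accurate segmentation of its domain. However, most existing segmentation methods do not preserve the initial topology of the given input, which is detrimental to many downstream object-based tasks. This is all the more true for deep learning models, most of which work at a local scale. In this paper, we propose a new topology-aware deep image segmentation method that relies on a new leakage loss: the Pathloss. Our method is an extension of the BALoss [1], in which we aim to improve leakage detection to better recover the closedness of image segmentation. This loss, based on a shortest-path search algorithm, allows us to correctly locate and fix the critical points (a leakage in the boundaries) that can occur in the predictions. In this way, loss minimization enforces connectivity only where it is necessary and finally provides a good localization of the object boundaries in the image. Moreover, our experiments show that the Pathloss learns to preserve elongated structures better than methods that do not use a topological loss. Trained with our topological loss function, our method outperforms state-of-the-art topology-aware methods on two representative datasets: electron microscopy and historical maps.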
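The shortest-path idea can be illustrated on a toy boundary-probability map. The sketch assumes a setting in which two seed pixels belong to different regions. It runs Dijkstra's algorithm on a 4-connected grid where entering a pixel costs its predicted boundary probability. A cheap path between the seeds means the predicted boundary has a gap (a leak), and the highest-probability pixel on that path marks where the wall is weakest. This is only the path-search step written for illustration, not the Pathloss itself. The function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def min_cost_path(prob: List[List[float]], src: Pixel, dst: Pixel) -> Tuple[List[Pixel], float]:
    """Dijkstra over a 4-connected grid; entering a pixel costs its boundary probability."""
    h, w = len(prob), len(prob[0])
    dist: Dict[Pixel, float] = {src: 0.0}
    prev: Dict[Pixel, Pixel] = {}
    heap = [(0.0, src)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == dst:
            break
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + prob[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk predecessors back from dst to rebuild the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[dst]


def leak_pixel(prob: List[List[float]], path: List[Pixel]) -> Pixel:
    """Pixel where the path crosses the weakest part of the predicted boundary."""
    return max(path, key=lambda p: prob[p[0]][p[1]])
```

Consider a vertical wall of probability 0.9 with a gap of 0.1 in the middle. The cheapest path between the two sides goes through the gap, and the gap pixel is returned as the leak. A loss can raise the boundary probability there without forcing connectivity elsewhere.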
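Image retrieval is commonly evaluated with Average Precision (AP) or Recall@k. However, these metrics are limited to binary labels and do not take into account the severity of errors. This paper introduces a new hierarchical AP training method for pertinent image retrieval (HAPPIER). HAPPIER is based on a new HAP metric, which leverages a concept hierarchy to refine AP by integrating the importance of errors and better evaluating rankings. To train deep models with HAP, we carefully study the structure of the problem and design a smooth lower-bound surrogate, combined with a clustering loss that ensures consistent ordering. Extensive experiments on 6 datasets show that HAPPIER significantly outperforms state-of-the-art methods for hierarchical retrieval, while being on par with the latest approaches when evaluating fine-grained ranking performance. Finally, we show that HAPPIER leads to a better organization of the embedding space and prevents the most severe failure cases of non-hierarchical methods. Our code is publicly available at: .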
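The sketch below shows how AP can be extended to graded relevance taken from a concept hierarchy. The exact HAP definition is not given in this abstract, so the formula here is an assumption: one plausible generalization that reduces exactly to standard AP when relevances are binary.

- Each retrieved item has a relevance in [0, 1], e.g. 1.0 for the same fine-grained class and 0.5 for a match at the coarse level only.
- An item ranked above the current one contributes the smaller of the two relevances to its "hierarchical rank".

The function name is hypothetical.

```python
from typing import Sequence


def graded_ap(rels: Sequence[float]) -> float:
    """AP generalized to graded relevance in [0, 1]; rels[k-1] is the relevance of the item at rank k."""
    total = sum(r for r in rels if r > 0)
    if total == 0:
        return 0.0
    score = 0.0
    for k, rel_k in enumerate(rels, start=1):
        if rel_k <= 0:
            continue
        # Credit from items ranked above is capped by the current item's relevance,
        # so a coarse match ranked early cannot fully "pay" for a fine match ranked late.
        h_rank = rel_k + sum(min(rel_k, rel_j) for rel_j in rels[:k - 1])
        score += h_rank / k
    return score / total
```

With binary labels, `graded_ap([1, 0, 1])` equals the usual AP of 5/6. Suppose binary AP treated both a fine and a coarse match as relevant: it would then score the orderings `[1.0, 0.5]` and `[0.5, 1.0]` as equally perfect. The graded version gives 1.0 to the first and only 5/6 to the second, because it accounts for error severity.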
